Search for: All records

Creators/Authors contains: "Yoder, Matthew"

TaxonWorks: Character state matrices and identification tools

https://doi.org/10.3897/biss.5.75554

Dmitriev, Dmitry; Yoder, Matthew (September 2021, Biodiversity Information Science and Standards)

TaxonWorks is an integrated web-based application for practicing taxonomists and biodiversity specialists. It is focused on promoting collaboration between researchers and developers. TaxonWorks has a modular structure that enables various components of the application to target specific needs and requirements of different groups of users. Specific areas of interest may include nomenclature-related tasks (Yoder and Dmitriev 2021) designed to help assemble and validate scientific name checklists of a target group of organisms; and collection management tasks, including interfaces to create, filter, and edit collecting events, collection objects, and loans. This presentation focuses on matrix-related tools integrated into TaxonWorks. A matrix, which could either be used for phylogenetic analysis or to build an identification key, is structured as a table where columns represent numerous characters that could be used to describe a set of entities, taxa or specimens (presented as rows of the table). Each cell of the table may contain observations for specific character/entity combinations. TaxonWorks does not generate a table for each particular matrix; instead, all observations are stored as graphs. This structure allows building a matrix of unlimited size as well as reusing individual observations in multiple matrices. For matrix columns, TaxonWorks supports a variety of different kinds of characters or descriptors: qualitative, presence/absence, quantitative, sample, gene, free text, and media. Each character may have specific properties; for example, a qualitative descriptor may have numerous character states, and a quantitative descriptor may have a measurement unit defined. For an entity in a matrix row, TaxonWorks supports either collection objects (specimens) or taxa as Operational Taxonomic Units (OTUs). OTUs could either be linked to nomenclature or be standalone entities (e.g., representing undescribed species).
The matrix, once built, could serve several purposes. A matrix based on qualitative and quantitative characters could be used to build an interactive key (Fig. 1), construct standardized natural language descriptions for each entity, and determine a diagnosis (a minimal set of characters that separate one entity from all others). It could also be exported and used for phylogenetic analysis or to build an interactive key in an external application. TaxonWorks supports export files in several formats, including Nexus, TNT, and NeXML. Application Programming Interfaces (APIs) are also available. A matrix based on media descriptors could be used as a pictorial identification tool (Fig. 2).
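The idea of storing observations once and assembling matrices on demand, and of deriving a diagnosis from them, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the flat dictionary stands in for the graph storage mentioned in the abstract, the names `observations`, `matrix_view`, `differs`, and `diagnosis` are hypothetical, the sample data are invented, and the brute-force search is not TaxonWorks' own method.

```python
from itertools import combinations

# Each observation is stored once, keyed by (entity, descriptor).
# This flat store is a stand-in for a graph; the data are invented.
observations = {
    ("otu_a", "wing_color"): "clear",
    ("otu_a", "antenna_segments"): 10,
    ("otu_a", "spines"): "present",
    ("otu_b", "wing_color"): "clear",
    ("otu_b", "antenna_segments"): 12,
    ("otu_b", "spines"): "present",
    ("otu_c", "wing_color"): "dark",
    ("otu_c", "antenna_segments"): 10,
    ("otu_c", "spines"): "absent",
}

rows = ["otu_a", "otu_b", "otu_c"]
columns = ["wing_color", "antenna_segments", "spines"]


def matrix_view(rows, columns):
    """Assemble a matrix on demand; no table is stored per matrix."""
    return {r: {c: observations.get((r, c)) for c in columns} for r in rows}


def differs(a, b):
    """Two cells separate entities only when both are scored and unequal."""
    return a is not None and b is not None and a != b


def diagnosis(target, rows, columns):
    """Smallest descriptor set that separates `target` from every other row.

    Brute force over subsets, smallest first; fine for a toy matrix only.
    """
    view = matrix_view(rows, columns)
    others = [r for r in rows if r != target]
    for size in range(1, len(columns) + 1):
        for combo in combinations(columns, size):
            if all(
                any(differs(view[target][c], view[o][c]) for c in combo)
                for o in others
            ):
                return list(combo)
    return None
```

In this toy data set, `diagnosis("otu_b", rows, columns)` needs a single descriptor, while `otu_a` needs two, because no single character separates it from both other entities. Calling `matrix_view(["otu_a", "otu_b"], ["antenna_segments"])` builds a second, smaller matrix from the same stored observations.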
Full Text Available
Nomenclature over 5 years in TaxonWorks: Approach, implementation, limitations and outcomes

https://doi.org/10.3897/biss.5.75441

Yoder, Matthew; Dmitriev, Dmitry (September 2021, Biodiversity Information Science and Standards)

We are now over four decades into digitally managing the names of Earth's species. As the number of federating (i.e., software that brings together previously disparate projects under a common infrastructure, for example TaxonWorks) and aggregating (e.g., International Plant Names Index, Catalogue of Life (CoL)) efforts increases, there remains an unmet need for both the migration forward of old data, and for the production of new, precise and comprehensive nomenclatural catalogs. Given this context, we provide an overview of how TaxonWorks seeks to contribute to this effort, and where it might evolve in the future. In TaxonWorks, when we talk about governed names and relationships, we mean it in the sense of existing international codes of nomenclature (e.g., the International Code of Zoological Nomenclature (ICZN)). More technically, nomenclature is defined as a set of objective assertions that describe the relationships between the names given to biological taxa and the rules that determine how those names are governed. It is critical to note that this is not the same thing as the relationship between a name and a biological entity; rather, nomenclature in TaxonWorks represents the details of the (governed) relationships between names. Rather than thinking of nomenclature as changing (a verb commonly used to express frustration with biological nomenclature), it is useful to think of nomenclature as a set of data points, which grows over time. For example, when synonymy happens, we do not erase the past, but rather record a new context for the name(s) in question. The biological concept changes, but the nomenclature (names) simply keeps adding up. Behind the scenes, nomenclature in TaxonWorks is represented by a set of nodes and edges, i.e., a mathematical graph, or network (e.g., Fig. 1).
Most names (i.e., nodes in the network) are what TaxonWorks calls "protonyms," monomial epithets that are used to construct, for example, binomial names (not to be confused with "protonym" sensu the ICZN). Protonyms are linked to other protonyms via relationships defined in NOMEN, an ontology that encodes governed rules of nomenclature. Within the system, all data, nodes and edges, can be cited, i.e., linked to a source and therefore anchored in time and tied to authorship, and annotated with a variety of annotation types (e.g., notes, confidence levels, tags). The actual building of the graphs is greatly simplified by multiple user-interfaces that allow scientists to review (e.g., Fig. 2), create, filter, and add to (again, not "change") the nomenclatural history. As in any complex knowledge-representation model, there are outlying scenarios, or edge cases that emerge, making certain human tasks more complex than others. TaxonWorks is no exception: it has limitations in terms of what and how some things can be represented. While many complex representations are hidden by simplified user-interfaces, some, for example, the handling of the ICZN's Family-group name, batch-loading of invalid relationships, and comparative syncing against external resources, need more work to simplify the processes presently required to meet catalogers' needs. The depth at which TaxonWorks can capture nomenclature is only really valuable if it can be used by others. This is facilitated by the application programming interface (API) serving its data (https://api.taxonworks.org), serving text files, and by exports to standards like the emerging Catalogue of Life Data Package. With reference to real-world problems, we illustrate different ways in which the API can be used, for example, as integrated into spreadsheets, through the use of command line scripts, and in the generation of public-facing websites.
Behind all this effort are an increasing number of people recording help videos, developing documentation, and troubleshooting software and technical issues. Major contributions have come from developers at many skill levels, from high school to senior software engineers, illustrating that TaxonWorks leads in enabling both technical and domain-based contributions. The health and growth of this community is a key factor in TaxonWorks' potential long-term impact in the effort to unify the names of Earth's species.
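The "add, do not change" view of nomenclature described in this abstract can be sketched as an append-only graph. This is only an illustration under stated assumptions: the classes `Protonym`, `Relationship`, and `NomenclatureGraph` are hypothetical and are not the TaxonWorks data model, the relationship labels are plain-language stand-ins rather than NOMEN terms, and the names (Aus, bus, cus) and sources are placeholders.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Protonym:
    """A node: a single monomial name at a given rank."""
    name: str
    rank: str


@dataclass(frozen=True)
class Relationship:
    """An edge between two protonyms, anchored to a cited source."""
    subject: Protonym
    kind: str      # plain-language label, NOT a real NOMEN identifier
    target: Protonym
    source: str    # placeholder citation
    year: int


@dataclass
class NomenclatureGraph:
    nodes: set = field(default_factory=set)
    edges: list = field(default_factory=list)

    def add_relationship(self, rel: Relationship) -> None:
        """Record a new assertion; earlier assertions are never removed."""
        self.nodes.update({rel.subject, rel.target})
        self.edges.append(rel)

    def history(self, protonym: Protonym) -> list:
        """All assertions involving a name, oldest first."""
        involved = [e for e in self.edges if protonym in (e.subject, e.target)]
        return sorted(involved, key=lambda e: e.year)


aus = Protonym("Aus", "genus")
bus = Protonym("bus", "species")
cus = Protonym("cus", "species")

graph = NomenclatureGraph()
graph.add_relationship(Relationship(bus, "original genus", aus, "placeholder source A", 1900))
graph.add_relationship(Relationship(cus, "original genus", aus, "placeholder source B", 1920))
# Synonymy adds an edge; the 1920 description of "cus" stays on record.
graph.add_relationship(Relationship(cus, "synonym of", bus, "placeholder source C", 1950))
```

After the synonymy is recorded, `graph.history(cus)` still returns the original 1920 assertion followed by the 1950 one, which is the sense in which the nomenclature "keeps adding up".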
Full Text Available
Self-publishing Biodiversity Data Products on the Web

https://doi.org/10.3897/biss.6.94061

Yoder, Matthew; Pereira, José Luis; Pereira, Hernán; Dmitriev, Dmitry; DeWalt, Ralph; Cigliano, Maria-Marta; Paul, Deborah L; Flood, James (September 2022, Biodiversity Information Science and Standards)

Biodiversity informatics workbenches and aggregators that make their data externally accessible via application programming interfaces (APIs) facilitate the development of customized applications that fit the needs of a diverse range of communities. In the past, the technical skills required to host web-facing applications placed constraints on many researchers: they either needed to find technical help, or expand their own skills. These limits are now significantly reduced when free or low-cost website hosting is combined with small, well-documented applications that require minimal configuration to set up. We illustrate two applications that take advantage of this approach: an interactive key engine (presently named "distinguish") and TaxonPages, a taxon page service application. Both applications make use of TaxonWorks' API. We discuss the limits, e.g., the user must be online to access the data behind the application, and advantages of this approach, e.g., the application server can be served locally, on the user's own computer, and the underlying data are all accessible in more technical formats.
Full Text Available
Formalizing Invertebrate Morphological Data: A Descriptive Model for Cuticle-Based Skeleto-Muscular Systems, an Ontology for Insect Anatomy, and their Potential Applications in Biodiversity Research and Informatics

https://doi.org/10.1093/sysbio/syad025

Girón, Jennifer C; Tarasov, Sergei; González Montaña, Luis Antonio; Matentzoglu, Nicolas; Smith, Aaron D; Koch, Markus; Boudinot, Brendon E; Bouchard, Patrice; Burks, Roger; Vogt, Lars; et al (April 2023, Systematic Biology)

The spectacular radiation of insects has produced a stunning diversity of phenotypes. During the past 250 years, research on insect systematics has generated hundreds of terms for naming and comparing them. In its current form, this terminological diversity is presented in natural language and lacks formalization, which prohibits computer-assisted comparison using semantic web technologies. Here we propose a Model for Describing Cuticular Anatomical Structures (MoDCAS) which incorporates structural properties and positional relationships for standardized, consistent, and reproducible descriptions of arthropod phenotypes. We applied the MoDCAS framework in creating the ontology for the Anatomy of the Insect Skeleto-Muscular system (AISM). The AISM is the first general insect ontology that aims to cover all taxa by providing generalized, fully logical, and queryable definitions for each term. It was built using the Ontology Development Kit (ODK), which maximizes interoperability with Uberon (Uberon multi-species anatomy ontology) and other basic ontologies, enhancing the integration of insect anatomy into the broader biological sciences. A template system for adding new terms, extending, and linking the AISM to additional anatomical, phenotypic, genetic, and chemical ontologies is also introduced.
The AISM is proposed as the backbone for taxon-specific insect ontologies and has potential applications spanning systematic biology and biodiversity informatics, allowing users to (1) use controlled vocabularies and create semi-automated computer-parsable insect morphological descriptions; (2) integrate insect morphology into broader fields of research, including ontology-informed phylogenetic methods, logical homology hypothesis testing, evo-devo studies, and genotype to phenotype mapping; and (3) automate the extraction of morphological data from the literature, enabling the generation of large-scale phenomic data, by facilitating the production and testing of informatic tools able to extract, link, annotate, and process morphological data. This descriptive model and its ontological applications will allow for clear and semantically interoperable integration of arthropod phenotypes in biodiversity studies.
Full Text Available
Enhanced monography in a collaboratively evolved hub for systematic biology

https://doi.org/10.18061/bssb.v1i1.8340

Girón, Jennifer C.; Valderrama, Eugenio; O'Connor, Patrick M.; Simmons, Nancy B.; Paul, Deborah L.; Yoder, Matthew J. (January 2022, Bulletin of the Society of Systematic Biologists)

No abstract available.
Full Text Available
What Can You Do with a TaxonWorks API?

https://doi.org/10.3897/biss.4.59170

Yoder, Matthew; Pereira, Hernán; Pereira, José Luis; Dmitriev, Dmitry; Ower, Geoffrey; Flood, James (October 2020, Biodiversity Information Science and Standards)
TaxonWorks is a web-based workbench facilitating curation of a broad cross-section of biodiversity informatics concepts. Its development is currently led by the Species File Group. TaxonWorks has a large, JSON-serving application programming interface (API). This API is slowly being exposed for external use. The API is documented at https://api.taxonworks.org. Here we highlight some existing key features of the API focusing on the TaxonWorks concepts of People, Sources, Collection Objects, Taxon Names, and Downloads and provide a brief roadmap for upcoming additions. Highlights include the ability for data curators to produce shareable bibliographies, DarwinCore Archives (DwC-A), and Catalogue of Life-formatted datasets, access their nomenclature as autocompletes and via many filter facets, share Person metadata including numerous identifier types, and perform basic GeoJSON and simple DwC-A parameter-based filtering on Collection Objects. As examples of what can be done with the API, we provide several visualizations that are straightforward to implement by those with basic R, Python, JavaScript, or Ruby programming skills.
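A short sketch may help show what working with a JSON-serving API looks like in practice. It is deliberately generic and makes several assumptions that are not taken from the abstract: the base URL is a placeholder, the resource path `taxon_names` and the query parameters `name` and `project_token` are hypothetical, and the sample response body is invented. The real routes and parameters should be taken from the documentation at https://api.taxonworks.org.

```python
import json
from urllib.parse import urlencode

# Placeholder base URL (assumption); substitute the address of a real instance.
BASE_URL = "https://example.org/api/v1"


def build_url(resource, **params):
    """Compose a request URL from a resource path and filter parameters."""
    query = urlencode(sorted(params.items()))
    return f"{BASE_URL}/{resource}?{query}" if query else f"{BASE_URL}/{resource}"


def names_from_response(body):
    """Pull the `name` field out of a JSON array of records."""
    records = json.loads(body)
    return [record["name"] for record in records if "name" in record]


# Hypothetical resource and parameters, for illustration only.
url = build_url("taxon_names", name="Aus", project_token="YOUR_TOKEN")

# Invented response body standing in for what a server might return.
sample_body = '[{"id": 1, "name": "Aus"}, {"id": 2, "name": "Aus bus"}]'
names = names_from_response(sample_body)
```

Keeping URL construction and response parsing in separate functions means the parsing can be exercised against saved responses without network access, and the same pattern carries over to the other languages the abstract mentions.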
Full Text Available
PARAMO: A Pipeline for Reconstructing Ancestral Anatomies Using Ontologies and Stochastic Mapping

https://doi.org/10.1093/isd/ixz009

Tarasov, Sergei; Mikó, István; Yoder, Matthew Jon; Uyeda, Josef C; Boudinot, Brendon (November 2019, Insect Systematics and Diversity)

Comparative phylogenetics has been largely lacking a method for reconstructing the evolution of phenotypic entities that consist of ensembles of multiple discrete traits: entire organismal anatomies or organismal body regions. In this study, we provide a new approach named PARAMO (Phylogenetic Ancestral Reconstruction of Anatomy by Mapping Ontologies) that appropriately models anatomical dependencies and uses ontology-informed amalgamation of stochastic maps to reconstruct phenotypic evolution at different levels of anatomical hierarchy including entire phenotypes. This approach provides new opportunities for tracking phenotypic radiations and evolution of organismal anatomies.
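The amalgamation step named in this abstract can be pictured with a toy example on a single branch. The sketch below is a simplified illustration and not the PARAMO implementation: it assumes each character's stochastic map on the branch is given as a list of `(state, duration)` segments covering the same total branch length, and that the set of characters to combine (which PARAMO derives from an ontology) has already been chosen by hand. The function name `amalgamate` and the data are hypothetical.

```python
def amalgamate(maps):
    """Overlay per-character histories on one branch into a combined history.

    `maps` is a list of histories, one per character, each a list of
    (state, duration) segments. All histories must span the same length.
    Returns (combined_state, duration) segments, where the combined state
    concatenates the individual character states in input order.
    """
    # Convert each history into (start, end, state) intervals.
    intervals = []
    for history in maps:
        t = 0.0
        spans = []
        for state, duration in history:
            spans.append((t, t + duration, state))
            t += duration
        intervals.append(spans)

    # Every point where any character changes state is a cut point.
    cuts = sorted({edge for spans in intervals for (a, b, _) in spans for edge in (a, b)})

    combined = []
    for start, end in zip(cuts, cuts[1:]):
        mid = (start + end) / 2
        # Look up each character's state at the midpoint of the segment.
        state = "".join(s for spans in intervals for (a, b, s) in spans if a <= mid < b)
        if combined and combined[-1][0] == state:
            # Merge adjacent segments that carry the same combined state.
            combined[-1] = (state, combined[-1][1] + end - start)
        else:
            combined.append((state, end - start))
    return combined


# Two hand-picked characters from the same (assumed) body region.
char_1 = [("0", 0.5), ("1", 0.5)]     # changes 0 -> 1 halfway along the branch
char_2 = [("0", 0.75), ("1", 0.25)]   # changes 0 -> 1 three quarters along

branch_history = amalgamate([char_1, char_2])
```

Here the combined history passes through the states `00`, `10`, and `11`, so the body region as a whole shows two changes on the branch even though each character changes only once.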
Full Text Available
TaxonWorks 1.0?

https://doi.org/10.3897/biss.3.37374

Yoder, Matthew; Dmitriev, Dmitry; Pereira, José Luis; Flood, James; Tucker, James; Pereira, Hernán; Beckman, Marilyn (June 2019, Biodiversity Information Science and Standards)

TaxonWorks is an open-source workbench for biodiversity researchers. With several years of development behind it, we highlight its present status, and discuss if and when it makes sense to release a version 1.0, i.e., software completed to a specific stage. TaxonWorks' scope is broad; it seeks to touch nearly all areas that might be of interest to taxonomists, i.e., those who integrate everything that is known about a taxon into a single resource. Its role as a software platform is placed in a broader context, where many instances of TaxonWorks each can support multiple research projects. Instances may be supported by individuals or organizations. A suite of technical tools including containerization and unit tests facilitates collaboration at many different levels. TaxonWorks is a research tool; mechanisms for analyzing the results of data curation, including its application programming interface, are described. The long-term development of TaxonWorks is supported by an endowment to the Species File Group. Its source is available on GitHub.
Full Text Available
TaxonWorks: A Use Case in Documenting Complex Biological Relationships

https://doi.org/10.3897/biss.2.25723

Trivellone, Valeria; Dietrich, Christopher H.; Dmitriev, Dmitry; Yoder, Matthew (May 2018, Biodiversity Information Science and Standards)

Compilation and retrieval of reliable data on biological interactions is one of the critical bottlenecks affecting efficiency and statistical power in testing ecological theories. TaxonWorks, a web-based workbench, can facilitate such research by enabling the digitization of complex biological interactions involving multiple species, individuals, and trophic levels. These data can be further organized into spatial and temporal axes, and annotated at the level of individual or grouped interactions (e.g., singularly citing the combined elements of a tritrophic interaction). The simple, customizable nature of the tools ultimately reduces the time-consuming steps of data gathering, cleaning, and formatting of datasets for subsequent exploration and analysis while also improving the asserted semantics. An example use case is provided with a dataset of associations among plants, pathogens and insect vectors. The curated data are accessed through the JSON-serving TaxonWorks API (Application Programming Interface) by an R package. Analysis and visualization of the network graphs persisted in TaxonWorks is demonstrated using core R functionality and the igraph package (Csardi and Nepusz 2006). TaxonWorks is open-source, collaboratively built software available at http://taxonworks.org.
Full Text Available
TaxonWorks

https://doi.org/10.3897/tdwgproceedings.1.20279

Yoder, Matthew; Dmitriev, Dmitry (August 2017, Proceedings of TDWG)

Full Text Available
